Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: has_operation #1807

Draft
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

camriddell
Copy link
Contributor

@camriddell camriddell commented Jan 13, 2025

What type of PR is this? (check all applicable)

  • πŸ’Ύ Refactor
  • ✨ Feature
  • πŸ› Bug Fix
  • πŸ”§ Optimization
  • πŸ“ Documentation
  • βœ… Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

Review Items

  • Does this conflict with Narwhals internal stable/version API?
  • Fix the typing here (specified return type is a shallow proxy to the real behavior)

Wanted to get feedback for an idea I had to implement has_operation. The current API implements the following changes

  • custom metaproperty decorator that behaves similarly to property (only the getter syntax other functionality would need to be implemented or would need to inherit from property itself).
    • enables chained syntax to move from an Expression/Series to its constituent namespaces without requiring an instance
    • nw.Expr.dt.date is valid syntax.
  • add has_operation to check if a given function passed using the above syntax is available for any given backend.

What would be nice for this feature would be if these components were more readily accessible from the narwhals namespace, dynamic imports based on internalized mappings can become tricky to maintain (see narwhals/utils:has_operation locals.backend_mapping) Instead these expressions could be added to the Implementation enum providing more informative metadata (such as import location lookups as well as namespace import names).


Feedback on the concept as a whole or the specific implementation is welcome! If @MarcoGorelli likes this approach, I do believe some additional work is required here to get this working correctly with the internal versioning interface so guidance in this spot would be appreciated.

Here are a few examples to show what this PR contains

from narwhals import has_operation
import narwhals as nw
import polars as pl
import pandas as pd
import pyarrow as pa
import duckdb
import dask.dataframe as dd


print(
    f"{has_operation(pl, nw.Expr.mean)                   =  }",  #  True
    f"{has_operation(pl, nw.Expr.dt.date)                =  }",  #  True
    f"{has_operation(pl, nw.Series.mean)                 =  }",  #  True
    f"{has_operation(pl, nw.Series.str.to_uppercase)     =  }",  #  True
    f"{has_operation(pl, nw.DataFrame.explode)           =  }",  #  True
    f"{has_operation(pl, nw.LazyFrame.drop_nulls)        =  }",  #  True
    "",
    f"{has_operation(pd, nw.Expr.mean)                   =  }",  #  True
    f"{has_operation(pd, nw.Expr.dt.date)                =  }",  #  True
    f"{has_operation(pd, nw.Series.mean)                 =  }",  #  True
    f"{has_operation(pd, nw.Series.str.to_uppercase)     =  }",  #  True
    f"{has_operation(pd, nw.DataFrame.explode)           =  }",  #  True
    f"{has_operation(pd, nw.LazyFrame.drop_nulls)        =  }",  #  False
    "",
    f"{has_operation(pa, nw.Expr.mean)                   =  }",  #  True
    f"{has_operation(pa, nw.Expr.dt.date)                =  }",  #  True
    f"{has_operation(pa, nw.Series.mean)                 =  }",  #  True
    f"{has_operation(pa, nw.Series.str.to_uppercase)     =  }",  #  True
    f"{has_operation(pa, nw.DataFrame.explode)           =  }",  #  False
    f"{has_operation(pa, nw.LazyFrame.drop_nulls)        =  }",  #  False
    "",
    f"{has_operation(duckdb, nw.Expr.mean)               =  }",  #  True
    f"{has_operation(duckdb, nw.Expr.dt.date)            =  }",  #  True
    f"{has_operation(duckdb, nw.Series.mean)             =  }",  #  False
    f"{has_operation(duckdb, nw.Series.str.to_uppercase) =  }",  #  False
    f"{has_operation(duckdb, nw.DataFrame.explode)       =  }",  #  False
    f"{has_operation(duckdb, nw.LazyFrame.drop_nulls)    =  }",  #  False
    "",
    f"{has_operation(dd, nw.Expr.mean)                   =  }",  #  True
    f"{has_operation(dd, nw.Expr.dt.date)                =  }",  #  True
    f"{has_operation(dd, nw.Series.mean)                 =  }",  #  False
    f"{has_operation(dd, nw.Series.str.to_uppercase)     =  }",  #  False
    f"{has_operation(dd, nw.DataFrame.explode)           =  }",  #  False
    f"{has_operation(dd, nw.LazyFrame.drop_nulls)        =  }",  #  True
    sep='\n'
)
has_operation(pl, nw.Expr.mean)                   =  True
has_operation(pl, nw.Expr.dt.date)                =  True
has_operation(pl, nw.Series.mean)                 =  True
has_operation(pl, nw.Series.str.to_uppercase)     =  True
has_operation(pl, nw.DataFrame.explode)           =  True
has_operation(pl, nw.LazyFrame.drop_nulls)        =  True

has_operation(pd, nw.Expr.mean)                   =  True
has_operation(pd, nw.Expr.dt.date)                =  True
has_operation(pd, nw.Series.mean)                 =  True
has_operation(pd, nw.Series.str.to_uppercase)     =  True
has_operation(pd, nw.DataFrame.explode)           =  True
has_operation(pd, nw.LazyFrame.drop_nulls)        =  False

has_operation(pa, nw.Expr.mean)                   =  True
has_operation(pa, nw.Expr.dt.date)                =  True
has_operation(pa, nw.Series.mean)                 =  True
has_operation(pa, nw.Series.str.to_uppercase)     =  True
has_operation(pa, nw.DataFrame.explode)           =  False
has_operation(pa, nw.LazyFrame.drop_nulls)        =  False

has_operation(duckdb, nw.Expr.mean)               =  True
has_operation(duckdb, nw.Expr.dt.date)            =  True
has_operation(duckdb, nw.Series.mean)             =  False
has_operation(duckdb, nw.Series.str.to_uppercase) =  False
has_operation(duckdb, nw.DataFrame.explode)       =  False
has_operation(duckdb, nw.LazyFrame.drop_nulls)    =  False

has_operation(dd, nw.Expr.mean)                   =  True
has_operation(dd, nw.Expr.dt.date)                =  True
has_operation(dd, nw.Series.mean)                 =  False
has_operation(dd, nw.Series.str.to_uppercase)     =  False
has_operation(dd, nw.DataFrame.explode)           =  False
has_operation(dd, nw.LazyFrame.drop_nulls)        =  True

@camriddell camriddell changed the title Feat/has operation feat: has_operation Jan 13, 2025
@MarcoGorelli
Copy link
Member

thanks Cam! i'm out teaching this week so have very limited time, but i'll take a look soon!

@MarcoGorelli
Copy link
Member

thanks for doing this!

would we need to remove blanket notimplementederrors like

def join_asof(
self: Self,
other: Self,
*,
left_on: str | None,
right_on: str | None,
by_left: list[str] | None,
by_right: list[str] | None,
strategy: Literal["backward", "forward", "nearest"],
suffix: str,
) -> Self:
msg = "join_asof is not yet supported on PyArrow tables" # pragma: no cover
raise NotImplementedError(msg)

for this to be accurate?

@camriddell
Copy link
Contributor Author

thanks for doing this!

would we need to remove blanket notimplementederrors like

def join_asof(
self: Self,
other: Self,
*,
left_on: str | None,
right_on: str | None,
by_left: list[str] | None,
by_right: list[str] | None,
strategy: Literal["backward", "forward", "nearest"],
suffix: str,
) -> Self:
msg = "join_asof is not yet supported on PyArrow tables" # pragma: no cover
raise NotImplementedError(msg)

for this to be accurate?

This is correct, though we may be able to detect this type of blanket NotImplementedErrors through the AST- though there will be some large edges that will appear

β‘  AST

from inspect import getsource
from textwrap import dedent
from ast import parse, NodeVisitor

from narwhals._arrow.dataframe import ArrowDataFrame

class Visitor(NodeVisitor):
    def __init__(self):
        self.has_notimplemented = False
        self.has_return = False
        super().__init__()

    def visit_Return(self, node):
        self.has_return = True
    def visit_Raise(self, node):
        if node.exc.func.id == 'NotImplementedError':
            self.has_notimplemented = True


source = dedent(getsource(ArrowDataFrame.join_asof))
tree = parse(source)
v = Visitor()
v.visit(tree)

print(f'{v.has_notimplemented = }')
print(f'{v.has_return         = }')

if v.has_notimplemented and not v.has_return:
    print('This method is likely not implemented')

# v.has_notimplemented = True
# v.has_return         = False
# This method is likely not implemented

β‘‘ Not Implemented Decorator (store on method)

Something more reliable may be to implement a NotImplemented decorator that attaches this metadata either to the method itself or registers it in some globally accessible place for later lookups.

from functools import wraps

def not_implemented(msg):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            raise NotImplementedError(msg)
        wrapper.not_implemented = True # store info on the method itself
        return wrapper
    return decorator


class T:
    @not_implemented('Not available in ...')
    def f(self):
        pass

    def g(self):
        pass

t = T()
# t.f()

print(
    f"{hasattr(T.f, 'not_implemented') = }",
    f"{hasattr(T.g, 'not_implemented') = }",
    sep='\n'
)


T.f()
hasattr(T.f, 'not_implemented') = True
hasattr(T.g, 'not_implemented') = False

Traceback (most recent call last):
  File "/home/cameron/.vim-excerpt", line 32, in <module>
    T.f()
  File "/home/cameron/.vim-excerpt", line 8, in wrapper
    raise NotImplementedError(msg)
NotImplementedError: Not available in ...

β‘’ Not Implemented Decorator (store in registry)

from functools import wraps

def not_implemented(msg):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            raise NotImplementedError(msg)

        not_implemented.registry.add(wrapper)
        return wrapper

    if not hasattr(not_implemented, 'registry'):
        not_implemented.registry = set()
    return decorator


class T:
    @not_implemented('Not available in ...')
    def f(self):
        pass

    def g(self):
        pass

t = T()

print(
    f'{not_implemented.registry        = }',
    f'{T.f in not_implemented.registry = }',
    f'{T.g in not_implemented.registry = }',
    sep='\n',
    end='\n'*2
)

t.f()
not_implemented.registry        = {<function T.f at 0x76976cc3a200>}
T.f in not_implemented.registry = True
T.g in not_implemented.registry = False

Traceback (most recent call last):
  File "/home/cameron/.vim-excerpt", line 36, in <module>
    t.f()
  File "/home/cameron/.vim-excerpt", line 8, in wrapper
    raise NotImplementedError(msg)
NotImplementedError: Not available in ...

@MarcoGorelli
Copy link
Member

hmm nice! if v.has_notimplemented and not v.has_return: is simpler than I was expecting it to be

maybe we can indeed do this! it may well be quite convenient for users

Comment on lines +388 to +449
T = TypeVar("T")
Namespace = TypeVar("Namespace")


class MetaProperty(Generic[T, Namespace]):
def __init__(
self, func: Callable[[T], Namespace], class_value: type[Namespace]
) -> None:
self._class_value = class_value
self._inst_method = func

@overload
def __get__(self, instance: None, owner: type[T]) -> type[Namespace]: ...
@overload
def __get__(self, instance: T, owner: type[T]) -> Namespace: ...
def __get__(self, instance: T | None, owner: type[T]) -> Namespace | type[Namespace]:
if instance is None:
return self._class_value
return self._inst_method(instance)


def metaproperty(
returns: type[Namespace],
) -> Callable[[Callable[[T], Namespace]], Namespace]: # TODO(Unassigned): Fix typing
"""Property decorator that changes the returned value when accessing from the class.
Arguments:
returns: The object to return upon class attribute accession.
Returns:
metaproperty descriptor.
Arguments:
returns: The object to return upon class attribute accession.
Returns:
A decorator that applies the custom metaproperty behavior.
Examples:
>>> from narwhals.utils import metaproperty
>>> class T:
... @property
... def f(self):
... return 5
...
... @metaproperty(str)
... def g(self):
... return 5
>>> t = T()
>>> assert t.f == t.g # 5
>>> assert isinstance(T.f, property)
>>> assert T.g is str
"""

def wrapper(
func: Callable[[T], Namespace],
) -> Namespace: # TODO(Unassigned): Fix typing
return MetaProperty(func, returns) # type: ignore[return-value]

return wrapper
Copy link
Member

@dangotbanned dangotbanned Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is the same, or just a similar issue - but reading this reminded me of an issue with DType.

The way that the nested types work seems like its unsafe, but isn't flagged by a type checker.
Using Array as an example, it has attributes annotated in class-scope (but not as a ClassVar), without a default.

narwhals/narwhals/dtypes.py

Lines 788 to 797 in 66628a5

inner: DType | type[DType]
size: int
shape: tuple[int, ...]
def __init__(
self: Self,
inner: DType | type[DType],
shape: int | tuple[int, ...] | None = None,
) -> None:
inner_shape: tuple[int, ...] = inner.shape if isinstance(inner, Array) else ()

Accessing on type[Array] would cause a runtime issue - but won't give a warning statically.

if dtype == dtypes.Datetime or isinstance(dtype, dtypes.Datetime):
dt_time_unit: TimeUnit = getattr(dtype, "time_unit", "us")
dt_time_zone = getattr(dtype, "time_zone", None)
return pl.Datetime(dt_time_unit, dt_time_zone) # type: ignore[arg-type]
if dtype == dtypes.Duration or isinstance(dtype, dtypes.Duration):
du_time_unit: TimeUnit = getattr(dtype, "time_unit", "us")
return pl.Duration(time_unit=du_time_unit) # type: ignore[arg-type]
if dtype == dtypes.List:
return pl.List(narwhals_to_native_dtype(dtype.inner, version, backend_version)) # type: ignore[union-attr]
if dtype == dtypes.Struct:
return pl.Struct(
fields=[
pl.Field(
name=field.name,
dtype=narwhals_to_native_dtype(field.dtype, version, backend_version),
)
for field in dtype.fields # type: ignore[union-attr]
]
)
if dtype == dtypes.Array: # pragma: no cover
size = dtype.size # type: ignore[union-attr]
kwargs = {"width": size} if backend_version < (0, 20, 30) else {"shape": size}
return pl.Array(
inner=narwhals_to_native_dtype(dtype.inner, version, backend_version), # type: ignore[union-attr]
**kwargs,
)
return pl.Unknown() # pragma: no cover

For comparison, Datetime does provide warnings - and doesn't have class-scoped attributes:

narwhals/narwhals/dtypes.py

Lines 491 to 500 in 66628a5

def __init__(
self: Self,
time_unit: TimeUnit = "us",
time_zone: str | timezone | None = None,
) -> None:
if time_unit not in {"s", "ms", "us", "ns"}:
msg = (
"invalid `time_unit`"
f"\n\nExpected one of {{'ns','us','ms', 's'}}, got {time_unit!r}."


I guess what I'm thinking is having something like polars has classinstmethod.
But for properties that have a default (accessible on the type) - without leaking to instances.

I feel like this might be similar to what you have here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pinging @MarcoGorelli to make sure you're aware of the potential bug here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants